All About Accountability / Those [Fill-in-the-Blank] Tests!

W. James Popham

Adjectives are such potent little words. In an instant, they can modify an unsuspecting noun, turning it into something righteous or wretched. And because one four-letter noun—namely, test—now functions as a lightning rod in our education lexicon, educators must be particularly cautious when we select adjectives to modify it.

Of course, educators use various kinds of tests, such as achievement tests and aptitude tests, as well as a galaxy of high-stakes, middle-stakes, and low-stakes tests. In truth, it's difficult these days for an educator to stumble across a truly no-stakes test. But certain education tests, especially in the current era of intensified education accountability, can influence the way teachers actually provide instruction for their students.

Accordingly, I want to draw your attention to key differences among three kinds of instructionally relevant tests that can have a huge impact on what goes on in classrooms: instructionally insensitive tests, instructionally sensitive tests, and instructionally informative tests. If educators understand the advantages and limitations of these three categories of tests, they can more effectively push for installation of the kinds of tests that truly improve instruction.

An instructionally insensitive test is essentially incapable of determining whether instruction was successful or unsuccessful. Test developers created many of these tests with a view to comparing test takers' scores with those of other test takers. Making those comparisons requires a substantial degree of spread in student scores; consequently, numerous items in these tests are linked to nicely spread out variables, such as student socioeconomic status. However, because such tests tend to measure what students bring to school rather than what they learn there, the tests turn out to be instructionally insensitive.

Unfortunately, the tests currently used in almost every state to satisfy the accountability provisions of No Child Left Behind (NCLB) are instructionally insensitive. Because NCLB's accountability strategy revolves around improving student test scores, using tests that offer teachers no clues about improving student achievement on those tests will clearly cripple NCLB's praiseworthy school improvement mission.

In contrast, an instructionally sensitive test will accurately reveal whether instruction has been terrific or tawdry. An instructionally sensitive test measures a modest number of important skills and bodies of knowledge and thus does not overwhelm teachers with a horde of curricular aims too numerous to teach or test in the time available. Moreover, an instructionally sensitive test describes each assessed curricular aim well enough that teachers can focus their instructional energies on promoting student mastery of these aims instead of training students to come up with correct answers to a series of test items. Finally, student performance on these sorts of tests is reported at an instructionally suitable level of specificity so that teachers can make sensible instructional decisions.

If teachers provide reasonably effective instruction focused on students' attainment of the curricular aims measured by an instructionally sensitive test, student scores on that test will most likely improve. Conversely, the absence of improved scores on an instructionally sensitive test will reveal the presence of ineffective instruction. For NCLB to function properly, schools must use instructionally sensitive tests.

We rarely encounter the instructionally informative test these days. Despite tons of rhetoric about the “diagnostic” yield of today's tests, most achievement tests are far from diagnostic. For a test to be genuinely diagnostic, any student's weak performance on the test should provide tangible indicators to teachers (and, quite possibly, to the student) about the nature of that learner's specific weaknesses.

One straightforward strategy for creating an instructionally informative test is to base test items on a carefully detailed identification of the subskills and knowledge that students must master en route to attaining a more ultimate curricular aim. For example, for students to succeed in writing a compelling persuasive essay, they first need to learn how to organize the essay and recognize what sorts of persuasion ploys are apt to be most potent. Several decades ago, the identification of enabling subskills and knowledge was referred to as a task analysis. Currently, instructional psychologists describe these collections of carefully sequenced subskills and knowledge as learning progressions or progress maps. Regardless of the label used, such analyses make it possible to build truly diagnostic tests.

To illustrate, test developers can construct multiple-choice items so that, in a given set of items, the wrong-answer options reveal specific student misunderstandings, such as transposed cause-and-effect relationships or particular categories of factual errors.

Similarly, for constructed-response items, test developers can create rubrics (scoring guides) that focus on a small number of instructionally addressable factors for evaluating student responses. Teachers use similar rubrics to evaluate students' written compositions, including such evaluative criteria as organization, content, mechanics, and voice. Rubrics such as these help both teachers and students identify instructional needs. By using instructionally sensitive tests that are also instructionally informative, educators will be able to see not only whether instruction is working, but also why a student is having problems. At least one state, Wyoming, has recently required its external test contractor to create instructionally informative NCLB assessments.

Clearly, instructionally sensitive and instructionally informative tests are not likely to be built by psychometric specialists who have scant knowledge regarding the day-to-day demands of classroom teaching. Such tests must be fashioned using heavy input from educators.

First, however, more educators must learn to differentiate between tests that nurture nifty instruction and those that don't. Only when test-knowledgeable educators demand that instructionally insensitive tests be replaced at the very least by instructionally sensitive tests and, ideally, by instructionally informative tests will we actually see tests used effectively to improve student achievement. We don't need more testing. What we need, relying once more on the power of adjectives, is fewer bad tests and more good tests.

James "Jim" Popham (1930–2025) was Emeritus Professor in the UCLA Graduate School of Education and Information Studies. At UCLA he won several distinguished teaching awards, and in January 2000, he was recognized by UCLA Today as one of UCLA's top 20 professors of the 20th century.

Popham was a former president of the American Educational Research Association (AERA) and the founding editor of Educational Evaluation and Policy Analysis, an AERA quarterly journal.

He spent most of his career as a teacher and was the author of more than 90 books, 250 journal articles, 50 research reports, and nearly 200 papers presented before research societies. His contributions to education spanned decades, shaping how we think about student assessment and educational evaluation.

From our issue

Challenging the Status Quo